Show the code
pacman::p_load(jsonlite, tidygraph, ggraph, dplyr,
visNetwork, graphlayouts, ggforce, knitr, kableExtra,
skimr, tidytext, tidyverse, igraph, ggplot2, RColorBrewer, wordcloud)This exercise will attempt to Question 1 of Mini Challenge 2 from Vast Challenge 2023. This exercise focuses on utilizing visual analytics to identify anomalies in the business groups within the knowledge graph.
In the code chunk below, necessary R packages are installed to import data, data preparation and visualization.
In the code chunk below, fromJSON function is used to import MC3 data into R environment.
The code chunk below is used to extract links from mc3_data and stored in the tibble data frame called “mc3_edges”.
The code chunk below is used to extract nodes from mc3_data and stored in the tibble data frame called “mc3_edges”.
In the code chunk below, datatable() functions is used to display mc3_edges and mc3_nodes tibble data frame as an interactive table.
In this code chunk, unnest_tokens function is used to split workds in “product_services” column.
Tidytext package function called stop_words is used to remove stop words.
In the code chunk below, a new node data table is prepared by using the source and target field of mc3_edges data frame.
In this code chunk, tidy graph data model is created.
Betweenness Centrality and Degree Centrality measurements are used to quantify each node’s interaction in the network and visualize the interactions. In the Code chunk below, mean and quartiles of betweenness centrality and degree centrality are calculated.
# Calculate mean and quartiles
mean_betweenness <- mean(mc3_df$betweenness_centrality, na.rm = TRUE)
quartiles_betweenness <- quantile(mc3_df$betweenness_centrality, probs = c(0.5,0.9,0.95,0.99), na.rm = TRUE)
max_betweenness <- max(mc3_df$betweenness_centrality, na.rm = TRUE)
mean_degree <- mean(mc3_df$degree_centrality, na.rm = TRUE)
quartiles_degree <- quantile(mc3_df$degree_centrality, probs = c(0.50,0.90,0.95), na.rm = TRUE)
max_degree <- max(mc3_df$degree_centrality, na.rm = TRUE)After that,the results are stored in the respective data frames.
# Create a data frame for betweenness centrality
betweenness_df <- data.frame(
Measure = c("Mean", "Quartile 0.50 (Median)", "Quartile 0.90", "Quartile 0.95", "Quartile 0.99", "Maximum"),
Value = c(mean_betweenness, quartiles_betweenness, max_betweenness)
)
# Create a data frame for degree centrality
degree_df <- data.frame(
Measure = c("Mean", "Quartile 0.50 (Median)", "Quartile 0.90","Quartile 0.95","Maximum"),
Value = c(mean_degree, quartiles_degree,max_degree)
)Betweenness Centrality Measurement | |
|---|---|
| Measure | Value |
| Mean | 8393.619 |
| Quartile 0.50 (Median) | 0.000 |
| Quartile 0.90 | 3.000 |
| Quartile 0.95 | 176.400 |
| Quartile 0.99 | 165050.000 |
| Maximum | 3849384.703 |
Degree Centrality Measurement | |
|---|---|
| Measure | Value |
| Mean | 1.287965 |
| Quartile 0.50 (Median) | 1.000000 |
| Quartile 0.90 | 2.000000 |
| Quartile 0.95 | 3.000000 |
| Maximum | 120.000000 |
An interactive plot showing top few nodes and edges with highest betweenness_centrality will be created.
For this, in this code chunk, edges with betweenness_centrality value over 100,000 are filtered and stored in new data frame.
In this code chunk, nodes with betweenness_centrality value over 100,000 are filtered and stored in new data frame.
Then, an interactive network graph that shows interaction of business groups with betweenness centrality value higher than 100,000 is plotted. The nodes are colored according to their business type.
nodes_top_betweenness <- nodes_top_betweenness %>% mutate(group = ifelse(is.na(type), "NA", type))
vis_plot <- visNetwork(nodes_top_betweenness, edges_top_betweenness,
main = "Betweenness Centrality")%>%
visIgraphLayout(layout = "layout_with_kk", smooth = FALSE,
physics = TRUE
) %>%
visNodes(size = 50, label=nodes_top_betweenness$label) %>%
visEdges(color = list(highlight = "lightgray"), arrows = 'to') %>%
visOptions(selectedBy = "type",
highlightNearest = list(enabled = TRUE,
degree = 1,
hover = TRUE,
labelOnly = TRUE),
nodesIdSelection = TRUE) %>%
visGroups(groupname = "Company", color = "lightblue") %>%
visGroups(groupname = "Company Contacts", color = "salmon") %>%
visGroups(groupname = "Beneficial Owner", color = "yellow") %>%
visGroups(groupname = "NA", color = "grey") %>%
visLegend(width = 0.1) %>%
visPhysics(repulsion = list(springlength = 50),
maxVelocity = 2,
solver = "forceAtlas2Based",
forceAtlas2Based = list(gravitationalConstant = -1000),
timestep = 0.25) %>%
visLayout(randomSeed=4)
vis_plotAn interactive plot showing nodes and edges in top 95% percentile of degree_centrality will be created.
In this code chunk, edges with degree_centrality value greater than or equals to 3 are filtered and stored in new data frame.
In this code chunk, nodes with degree_centrality value greater than or equals to 3 are filtered and stored in new data frame.
Then, an interactive network graph that shows interaction of business groups with degree centrality greater than or equal to 3 is plotted. The nodes are colored according to their business type.
nodes_top_degree <- nodes_top_degree %>% mutate(group = ifelse(is.na(type), "NA", type))
vis_plot <- visNetwork(nodes_top_degree, edges_top_degree,
main = "Degree Centrality")%>%
visIgraphLayout(layout = "layout_with_kk", smooth = FALSE,
physics = TRUE
) %>%
visNodes(size = 50, label=nodes_top_degree$label) %>%
visEdges(color = list(highlight = "lightgray"), arrows = 'to') %>%
visOptions(selectedBy = "type",
highlightNearest = list(enabled = TRUE,
degree = 1,
hover = TRUE,
labelOnly = TRUE),
nodesIdSelection = TRUE) %>%
visGroups(groupname = "Company", color = "lightblue") %>%
visGroups(groupname = "Company Contacts", color = "salmon") %>%
visGroups(groupname = "Beneficial Owner", color = "yellow") %>%
visGroups(groupname = "NA", color = "grey") %>%
visLegend(width = 0.1) %>%
visPhysics(repulsion = list(springlength = 50),
maxVelocity = 2,
solver = "forceAtlas2Based",
forceAtlas2Based = list(gravitationalConstant = -1000),
timestep = 0.25) %>%
visLayout(randomSeed=4)
vis_plotmean_counts <- mean(owner_count$count, na.rm = TRUE)
quartiles_counts <- quantile(owner_count$count, probs = c(0.5,0.95), na.rm = TRUE)
# Create a data frame for betweenness centrality
count_df <- data.frame(
Measure = c("Mean", "Quartile 0.50 (Median)", "Quartile 0.95"),
Value = c(mean_counts, quartiles_counts)
)
count_df %>%
kbl() %>%
kable_paper("hover", full_width = F) %>%
row_spec(0, bold = T, color = "white", background = "#D7261E")| Measure | Value | |
|---|---|---|
| Mean | 3.472574 | |
| 50% | Quartile 0.50 (Median) | 3.000000 |
| 95% | Quartile 0.95 | 5.000000 |